TESTING FOR CHANGE POINTS IN TIME SERIES MODELS
AND LIMITING THEOREMS FOR NED SEQUENCES 1 

By Shiqing Ling 

Hong Kong University of Science and Technology 

This paper first establishes a strong law of large numbers and a
strong invariance principle for forward and backward sums of
near-epoch dependent sequences. Using these limiting theorems, we
develop a general asymptotic theory on the Wald test for change points
in a general class of time series models under the no change-point
hypothesis. As an application, we verify our assumptions for the
long-memory fractional ARIMA model.

1. Introduction. Testing for structural change has been an important
issue in statistics. The earliest references go back to Chow [9] and
Quandt [33]. Chow's test assumes that the time of structural change is
known a priori, so that critical values from the $\chi^2$ distribution can
be used directly. Quandt's test takes the largest Chow test statistic over
all possible times of the structural change. Quandt's test appears to be
more reasonable in practice because it does not need to assume the time of
structural change a priori. However, its critical values are hard to obtain
even approximately due to singular behavior near the end points. One method
is to restrict the change-point interval (0,1) to $[\tau_1, \tau_2]$ with
$0 < \tau_1 < \tau_2 < 1$; see [2], [4], [15], [16], [22]. Another important
method is to normalize the Quandt-type test. This type of test statistic has
a Darling-Erdős-type limit and its critical values are easily obtained. This
method was developed by Yao and Davis [38] for i.i.d. normal data, and was
extended by Horváth [17] to general i.i.d. data and by Horváth [18] to linear
regression models. However, when using this method for time series models,
we encounter some great challenges.

To understand these, we look at the AR(1) model, $y_t = \phi y_{t-1} + \varepsilon_t$, where
$|\phi| < 1$ and $\{\varepsilon_t\}$ are independent and identically distributed (i.i.d.) errors.
First, we need to obtain the rate of uniform convergence of the partial sample
information matrices based on $\{y_1, \ldots, y_k\}$ and $\{y_{k+1}, \ldots, y_n\}$, respectively;
that is, for some $\delta > 0$, we need to establish

$$\text{(a)}\ \max_{g_n<k<n}\left|\frac{1}{k^{1-\delta}}\sum_{t=1}^{k}X_t\right| = o_p(1) \quad\text{and}\quad \text{(b)}\ \max_{g_n<\tilde k<n}\left|\frac{1}{\tilde k^{1-\delta}}\sum_{t=k+1}^{n}X_t\right| = o_p(1)$$

as $n \to \infty$, where $\tilde k = n - k$, $g_n = \log\log\log(\max\{e^e, n\})$ and $X_t = y_{t-1}^2 - Ey_{t-1}^2$
(see Lemma 6.2). Under the strong mixing condition with $Ey_t^4 < \infty$,
Davis, Huang and Yao [12] first established that

$$\frac{1}{k^{1-\delta}}\sum_{t=1}^{k}X_t = o(1) \quad\text{a.s.,} \tag{1.1}$$



using the strong invariance principle in Kuelbs and Philipp [23], and then
used (1.1) to obtain (a). We note that the ergodic theorem only ensures that
$\sum_{t=1}^{k}X_t/k = o(1)$ a.s., which cannot be used for (a), and hence, (1.1) in [12]
is novel. Since $\{y_t\}$ is strictly stationary, (b) is equivalent, for any $\epsilon > 0$, to

$$P\left(\max_{g_n<\tilde k<n}\left|\frac{1}{\tilde k^{1-\delta}}\sum_{t=-\tilde k+1+n}^{n}X_t\right| > \epsilon\right) = P\left(\max_{g_n<\tilde k<n}\left|\frac{1}{\tilde k^{1-\delta}}\sum_{t=-\tilde k}^{-1}X_t\right| > \epsilon\right) = o(1).$$

This is not equivalent to (a) if $\{y_t\}$ is not time-reversible. Except for
Gaussian linear processes, very few time series have been shown to be
time-reversible; see [8]. Thus, (1.1) cannot be used for (b), generally. To solve this
problem, we need the following strong law of large numbers (SLLN):

$$\frac{1}{k^{1-\delta}}\sum_{t=-k}^{-1}X_t = o(1) \quad\text{a.s.} \tag{1.2}$$

However, this has not been established in the literature.

Second, we need to approximate the score functions based on the
subsamples $\{y_1, \ldots, y_k\}$ and $\{y_{k+1}, \ldots, y_n\}$ by i.i.d. normal random sequences
$\{G_{1t}: t = 1, 2, \ldots\}$ and $\{G_{2t}: t = 1, 2, \ldots\}$, respectively, such that

$$\text{(c)}\ \max_{g_n<k<n} k^{\delta}\left|\frac{1}{\sqrt{k}}\sum_{t=1}^{k}y_{t-1}\varepsilon_t - \frac{1}{\sqrt{k}}\sum_{t=1}^{k}G_{1t}\right| = o_p(1),$$

$$\text{(d)}\ \max_{g_n<\tilde k<n} \tilde k^{\delta}\left|\frac{1}{\sqrt{\tilde k}}\sum_{t=k+1}^{n}y_{t-1}\varepsilon_t - \frac{1}{\sqrt{\tilde k}}\sum_{t=1}^{\tilde k}G_{2t}\right| = o_p(1),$$

for some $\delta > 0$. Davis, Huang and Yao [12] first used the result in Kuelbs
and Philipp [23] to establish the strong invariance principle (SIP),

$$\frac{1}{\sqrt{k}}\sum_{t=1}^{k}y_{t-1}\varepsilon_t = \frac{1}{\sqrt{k}}\sum_{t=1}^{k}G_{1t} + o(k^{-\delta}) \quad\text{a.s.,} \tag{1.3}$$

with strong mixing $\{y_t\}$, for some $\delta > 0$, and then used (1.3) to prove (c).
Similarly to (b), to prove (d), we need the backward SIP, that is, there is
an i.i.d. normal random sequence $\{G_{2t}: t = 1, 2, \ldots\}$ such that

$$\frac{1}{\sqrt{k}}\sum_{t=-k}^{-1}y_{t-1}\varepsilon_t = \frac{1}{\sqrt{k}}\sum_{t=1}^{k}G_{2t} + o(k^{-\delta}) \quad\text{a.s.} \tag{1.4}$$

Again, there is no result for (1.4) in the literature. The preceding
difficulties arise not only in Quandt-type tests but also in estimating
change-points as in [3], [26], [31]. This issue seems not to be well discussed
in the literature.

This paper first establishes a new SLLN and a new SIP for the backward
sums of near-epoch dependent (NED) sequences. The existing SLLN
and SIP for the forward sums of random sequences related to (a) and (c),
such as those in [34] and [14], require some mixing and high-order moment
conditions, or do not have a rate of convergence (see also [25]). The mixing
conditions are not always easy to verify. The high-order moment condition
directly links to the restriction on the parameter space in some nonlinear
time series models such as ARCH-type models. The weakest moment condition
is in the ergodic theorem, but it does not have a rate of convergence.
This paper next establishes a SLLN and a SIP with a rate of convergence for
the forward sums of NED sequences under a weak moment condition and
without a strong mixing assumption.

Our SLLNs and SIPs are given in Section 2. Using them, we study the
Wald test for change-points in a class of time series models in Section 3.
This is a general theory and can be used for many time series models. As an
application, we verify our assumptions for long-memory FARIMA models
in Section 4. The proofs are given in Sections 5-7. Throughout this paper,
we use the following notation: $|A| = [\mathrm{tr}(AA')]^{1/2}$ for a vector or matrix $A$
and $\|Z\|_p = (E|Z|^p)^{1/p}$ for a random vector or matrix $Z$ with its elements
in $L^p$ space ($p \ge 1$). Finally, we refer to the related references [20] and [19]
for Quandt-type tests with the long-memory time series, and to [24] for the
sequential approach.

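2. Limiting theorems for NED sequences. Let $\{\varepsilon_t\}$ be a series of
independent random variables (or vectors) on the probability space $(\Omega, \mathcal{B}, P)$,
$\mathcal{F}_t = \sigma\{\varepsilon_t, \varepsilon_{t-1}, \ldots\}$ and $X_t$ be an $\mathcal{F}_t$-measurable $m \times 1$ random vector for
$t = 0, \pm 1, \ldots$. We first introduce the following definition.

Definition 2.1. Let $\mathcal{F}_k(j)$ be the $\sigma$-field generated by
$\{\varepsilon_j, \varepsilon_{j-1}, \ldots, \varepsilon_{j-k+1}\}$ with $k \ge 1$, and $\mathcal{F}_0(j) = \{\emptyset, \Omega\}$. $\{X_t\}$ is said to be $L^p(\nu)$ NED in
terms of $\{\varepsilon_t\}$ if $\sup_{-\infty<t<\infty}\|X_t\|_p < \infty$ and $\sup_{-\infty<t<\infty}\|X_t - E[X_t|\mathcal{F}_k(t)]\|_p = O(k^{-\nu})$,
where $p \ge 1$ and $\nu > 0$.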
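
To make Definition 2.1 concrete, the following sketch checks the NED approximation error for a stationary AR(1) process $X_t = \sum_{j\ge 0}\phi^j\varepsilon_{t-j}$ with i.i.d. standard normal errors. This example is not taken from the paper; the function names and the Monte Carlo settings are illustrative assumptions. Conditioning on $\mathcal{F}_k(t)$ keeps the first $k$ terms of the moving-average expansion, so the $L^2$ error is $|\phi|^k/\sqrt{1-\phi^2}$, which decays faster than any power $k^{-\nu}$.

```python
import numpy as np

def ned_error_ar1(phi, k, sigma=1.0):
    """Exact L2 norm of X_t - E[X_t | F_k(t)] for X_t = sum_j phi^j eps_{t-j}.

    E[X_t | eps_t, ..., eps_{t-k+1}] keeps the first k terms, so the remainder
    sum_{j >= k} phi^j eps_{t-j} has variance sigma^2 phi^(2k) / (1 - phi^2).
    """
    return sigma * abs(phi) ** k / np.sqrt(1.0 - phi ** 2)

def ned_error_mc(phi, k, n_rep=20_000, n_lags=100, seed=0):
    """Monte Carlo estimate of the same L2 norm (illustrative settings)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_rep, n_lags))   # column j plays eps_{t-j}
    weights = phi ** np.arange(n_lags)           # MA(infinity) weights, truncated
    x_full = eps @ weights                       # X_t (truncated at n_lags terms)
    x_cond = eps[:, :k] @ weights[:k]            # E[X_t | F_k(t)]
    return np.sqrt(np.mean((x_full - x_cond) ** 2))

if __name__ == "__main__":
    for k in (1, 3, 5, 10):
        print(k, ned_error_ar1(0.5, k), ned_error_mc(0.5, k))
```

Under these assumptions the AR(1) process is $L^2(\nu)$ NED for every $\nu > 0$, since a geometric rate dominates any polynomial rate.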
This notion of NED sequence extends a concept introduced in Billingsley
[7]. Some different versions appear in [30], [32], [36]. This NED $\{X_t\}$ is
a mixingale, that is, $\sup_{-\infty<t<\infty}\|EX_t - E(X_t|\mathcal{F}_{t-k})\|_p = O(k^{-\nu})$.
Our SLLN and SIP are as follows.

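Theorem 2.1. Let $\{X_t: t = 0, \pm 1, \ldots\}$ be an $L^{1+\iota}(\nu)$ NED and mean
zero sequence in terms of $\{\varepsilon_t\}$ with $\iota > 0$ and $\nu > 0$. Then there exists a
constant $\delta > 0$ such that

$$\text{(a)}\ \frac{1}{k}\sum_{t=1}^{k}X_t = o\left(\frac{1}{k^{\delta}}\right)\ \text{a.s.} \quad\text{and}\quad \text{(b)}\ \frac{1}{k}\sum_{t=-k}^{-1}X_t = o\left(\frac{1}{k^{\delta}}\right)\ \text{a.s.}$$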

Remark 2.1. The moment condition in Theorem 2.1 is only slightly
stronger than that in the ergodic theorem for the forward sums. But our
SLLN includes a rate of convergence, while the ergodic theorem does not.
We conjecture that this is the weakest moment condition for the NED sequence
if a rate of convergence is wanted. This rate is indispensable when we prove
Lemmas 6.2-6.4. The independence of $\{\varepsilon_t\}$ can be replaced by some mixing
conditions. If we allow $\iota \ge 1$ and $\nu > 0.5$, then a sharper rate of convergence
may be obtained; see, for example, [14], page 41. If we assume $\iota = 1$ and use
the moment bound of Ing and Wei [21], then a relationship between the rate
of convergence and the serial dependence can be given.

Theorem 2.2. Let $X_t$ be a martingale difference in terms of $\mathcal{F}_t$ with
covariance matrix $\Omega$, and be $L^{2+\iota}(\nu)$ NED in terms of $\{\varepsilon_t\}$ with $\iota > 0$, where
either $2\nu > 1$ or $2\nu = 1$ and there exist constants $\nu_1 > 0$ and $\iota_1 > 0$ with
$2\nu_1 > 1$ such that

$$\sup_{-\infty<t<\infty}\|E[X_t|\mathcal{F}_{k+1}(t)] - E[X_t|\mathcal{F}_k(t)]\|_{2+\iota_1} = O(k^{-\nu_1}). \tag{2.1}$$

Then, without changing its distribution, we can redefine $\{X_t\}$ on two richer
probability spaces together with two sequences of i.i.d. $m \times 1$ normal vectors
with mean zero and covariance matrix $\Omega$, $\{G_{1t}: t = 1, 2, \ldots\}$ and $\{G_{2t}: t = 1, 2, \ldots\}$,
such that, for some constant $\delta > 0$, we have, respectively,

$$\text{(a)}\ \sum_{t=1}^{k}X_t = \sum_{t=1}^{k}G_{1t} + O(k^{1/2-\delta})\ \text{a.s.} \quad\text{and}$$

$$\text{(b)}\ \sum_{t=-k}^{-1}X_t = \sum_{t=1}^{k}G_{2t} + O(k^{1/2-\delta})\ \text{a.s.}$$



Remark 2.2. The two richer probability spaces may be different, for
which we refer to [6] and [13]. Theorems 2.1-2.2 do not require $\{X_t\}$ to be
stationary and can be extended to triangular arrays as in [1].
3. Testing change-points in time series models. Assume that the time
series $\{y_t: t = 0, \pm 1, \pm 2, \ldots\}$ is $\mathcal{F}_t$-measurable, strictly stationary and ergodic,
and is generated by the model

$$y_t = f(\lambda, Y_{t-1}, \varepsilon_t), \tag{3.1}$$

where $f$ is a known function, $\lambda$ is an $m \times 1$ unknown parameter vector, $\{\varepsilon_t\}$
is i.i.d. and $Y_t = (y_t, y_{t-1}, \ldots)$. The structure of $\{y_t\}$ is characterized by $f$
and $\lambda$. This class of models (3.1) includes many time series models in the
literature, such as ARMA, GARCH and random coefficient AR models. We
assume that the parameter space $\Theta$ is a compact subset of $R^m$, and the true
value of $\lambda$, denoted by $\lambda_0$, is an interior point in $\Theta$, where $R = (-\infty, \infty)$.

We denote the model (3.1) with the true parameter $\lambda_0$ by $M(\lambda_0)$. Let
$y_1, \ldots, y_n$ be the observations. We consider the null and alternative
hypotheses,

$$H_0: \{y_1, \ldots, y_n\} \in M(\lambda_0) \quad\text{versus}$$

$$H_{1n}(k): \{y_1, \ldots, y_k\} \in M(\lambda_0) \ \text{and}\ \{y_{k+1}, \ldots, y_n\} \in M(\lambda_{10})$$

with $\lambda_0 \ne \lambda_{10}$ for some $k \in [1, n)$.

Here, $k = [n\tau]$ is called the change-point with $\tau \in (0, 1)$, where $[x]$ is the
integer part of $x$. Under $H_{1n}(k)$, we use the following objective functions
(OF) to estimate $\lambda_0$ and $\lambda_{10}$, based on the sample $\{y_1, \ldots, y_n\}$ with initial
value $Y_0$, respectively,

$$L_n(k, \lambda) = \sum_{t=1}^{k} l(\lambda, Y_t) \quad\text{and}\quad L_{1n}(k, \lambda_1) = \sum_{t=k+1}^{n} l(\lambda_1, Y_t), \tag{3.2}$$

where $l(\lambda, Y_t)$ is a measurable function in terms of $Y_t$ and is almost surely
(a.s.) three times differentiable with respect to $\lambda$. The function $l(\lambda, Y_t)$ can be
taken as those in LSE, MLE, quasi-MLE and M-estimators, among others.
Let $l_t(\lambda) = l(\lambda, Y_t)$, $D_t(\lambda) = \partial l_t(\lambda)/\partial\lambda$ and $P_t(\lambda) = -\partial^2 l_t(\lambda)/\partial\lambda\,\partial\lambda'$. Denote
$\Sigma = E[P_t(\lambda_0)]$ and $\Omega = E[D_t(\lambda_0)D_t'(\lambda_0)]$. Here and below, the expectation is
with respect to the probability measure under the null hypothesis. We first
give two sets of assumptions as follows.

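Assumption 3.1. For some constant $\iota > 0$ and an open neighborhood
$\Theta_0$ of $\lambda_0$:

(i) $E\sup_{\lambda\in\Theta}|l_t(\lambda)|^{1+\iota} < \infty$ and $E[l_t(\lambda)]$ has a unique maximum at $\lambda = \lambda_0$;

(ii) $D_t(\lambda_0)$ is an $\mathcal{F}_t$-measurable martingale difference with $\Omega > 0$;

(iii) $\Sigma > 0$ and $E\sup_{\lambda\in\Theta_0}|P_t(\lambda)|^{1+\iota} < \infty$;

(iv) $E\sup_{\lambda\in\Theta_0}|\partial p_{ijt}(\lambda)/\partial\lambda|^{0.5+\iota} < \infty$, where $p_{ijt}(\lambda)$ is the $(i,j)$th
element of $P_t(\lambda)$.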
Assumption 3.2. For some $\iota > 0$, $\nu_0 > 0$ and $\nu \ge 1/2$ and an open
neighborhood $\Theta_0$ of $\lambda_0$:

(i) $\|\sup_{\lambda\in\Theta}|l_t(\lambda) - E[l_t(\lambda)|\mathcal{F}_k(t)]|\|_{1+\iota} = O(k^{-\nu_0})$;

(ii) $D_t(\lambda_0)$ is $L^{2+\iota}(\nu)$ NED in terms of $\{\varepsilon_t\}$ with $2\nu > 1$ or with $2\nu = 1$
and (2.1) being satisfied as $X_t = D_t(\lambda_0)$;

(iii) $\|\sup_{\lambda\in\Theta_0}|P_t(\lambda) - E[P_t(\lambda)|\mathcal{F}_k(t)]|\|_{1+\iota} = O(k^{-\nu_0})$.

When $\iota = 0$, Assumption 3.1(i)-(iii) is typical for estimating $\lambda_0$ in model
(3.1). We need the $(1 + \iota)$th finite moment here because the ergodic theorem
cannot be used for $\hat\lambda_{1n}(k)$. Assumption 3.1(iv) is for the rate of uniform
convergence in (6.2). Assumption 3.2 is a key to using Theorems 2.1-2.2 for
Lemmas 6.1-6.2. In practice, $Y_0$ is usually replaced by some constants. Let
$\tilde l_t(\lambda)$, $\tilde D_t(\lambda)$ and $\tilde P_t(\lambda)$ be defined as $l_t(\lambda)$, $D_t(\lambda)$ and $P_t(\lambda)$, respectively,
with initial values $y_t$ being zero or a constant for $t \le 0$. Our initial condition
is as follows.



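Assumption 3.3. For some constants $\delta > 0$, $\nu_0 > 0$ and $\nu \ge 1/2$ and an
open neighborhood $\Theta_0$ of $\lambda_0$:

(i) $E\sup_{\lambda\in\Theta}|\tilde l_t(\lambda) - l_t(\lambda)| = O(t^{-\nu_0})$;

(ii) $\sup_{g_n<k<n}\{k^{-1/2+\delta}|\sum_{t=1}^{k}[\tilde D_t(\lambda_0) - D_t(\lambda_0)]|\} = o_p(1)$ and
$\sup_{g_n<n-k<n}\{(n-k)^{-1/2+\delta}|\sum_{t=k+1}^{n}[\tilde D_t(\lambda_0) - D_t(\lambda_0)]|\} = o_p(1)$;

(iii) $E\sup_{\lambda\in\Theta_0}|\tilde P_t(\lambda) - P_t(\lambda)| = O(t^{-\nu_0})$ and $\|\tilde D_t(\lambda_0) - D_t(\lambda_0)\|_{1+\iota} = O(\alpha_t)$,
where $g_n = \log\log\log(\max\{e^e, n\})$ and $\alpha_t = t^{-\nu}\log^q t$ for some $q > 0$.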



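It can be shown that Assumption 3.3(ii) holds if $2\nu > 1$ in the second part
of Assumption 3.3(iii). The OFs in (3.2) are modified as

$$\tilde L_n(k, \lambda) = \sum_{t=1}^{k}\tilde l_t(\lambda) \quad\text{and}\quad \tilde L_{1n}(k, \lambda_1) = \sum_{t=k+1}^{n}\tilde l_t(\lambda_1). \tag{3.3}$$

Let $\hat\lambda_n(k)$ and $\hat\lambda_{1n}(k)$, respectively, be the maximizers of $\tilde L_n(k, \lambda)$ and
$\tilde L_{1n}(k, \lambda_1)$ on $\Theta$ for each known $k$. The Wald test statistic evaluated at
$[\hat\lambda_n(k), \hat\lambda_{1n}(k)]$ for testing $H_0$ against $H_{1n}(k)$ is defined as

$$W_n(k) = \frac{k(n-k)}{n^2}[\hat\lambda_n(k) - \hat\lambda_{1n}(k)]'[\hat\Sigma_n(k)\hat\Omega_n^{-1}(k)\hat\Sigma_n(k)][\hat\lambda_n(k) - \hat\lambda_{1n}(k)],$$

where $\hat\Sigma_n(k) = \sum_{t=1}^{k}\tilde P_t(\hat\lambda_n(k)) + \sum_{t=k+1}^{n}\tilde P_t(\hat\lambda_{1n}(k))$ and

$$\hat\Omega_n(k) = \sum_{t=1}^{k}\tilde D_t(\hat\lambda_n(k))\tilde D_t'(\hat\lambda_n(k)) + \sum_{t=k+1}^{n}\tilde D_t(\hat\lambda_{1n}(k))\tilde D_t'(\hat\lambda_{1n}(k)).$$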
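
As an illustration of how $W_n(k)$ can be evaluated, the sketch below specializes the definitions to an AR(1) model with the least-squares objective $l_t(\phi) = -(y_t - \phi y_{t-1})^2/2$, so that $D_t(\phi) = y_{t-1}(y_t - \phi y_{t-1})$ and $P_t(\phi) = y_{t-1}^2$. This specialization, the helper `simulate_ar1` and all function names are assumptions made for the example; they are not the paper's implementation.

```python
import numpy as np

def simulate_ar1(n, phi_first, phi_second, k_break, seed=0):
    """Hypothetical helper: AR(1) path y_0, ..., y_n with y_0 = 0 whose
    coefficient switches from phi_first to phi_second after time k_break."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n + 1)
    for t in range(1, n + 1):
        phi = phi_first if t <= k_break else phi_second
        y[t] = phi * y[t - 1] + rng.standard_normal()
    return y

def wald_ar1(y):
    """Return W_n(k), k = 1, ..., n-1, for the AR(1) least-squares objective.

    y has length n + 1: y[0] is the initial value, y[1:] is the sample.
    Entry k - 1 of the result holds W_n(k); it is NaN when a subsample
    has no variation in the regressor.
    """
    y = np.asarray(y, dtype=float)
    lag, cur = y[:-1], y[1:]
    n = cur.size
    sxy = np.cumsum(lag * cur)
    sxx = np.cumsum(lag ** 2)
    w = np.full(n - 1, np.nan)
    for k in range(1, n):
        den_a, den_b = sxx[k - 1], sxx[-1] - sxx[k - 1]
        if den_a == 0.0 or den_b == 0.0:
            continue
        phi_a = sxy[k - 1] / den_a                    # estimator from y_1..y_k
        phi_b = (sxy[-1] - sxy[k - 1]) / den_b        # estimator from y_{k+1}..y_n
        d_a = lag[:k] * (cur[:k] - phi_a * lag[:k])   # scores on first segment
        d_b = lag[k:] * (cur[k:] - phi_b * lag[k:])   # scores on second segment
        sigma_hat = sxx[-1]                           # sum of P_t over both segments
        omega_hat = np.sum(d_a ** 2) + np.sum(d_b ** 2)
        w[k - 1] = (k * (n - k) / n ** 2) * (phi_a - phi_b) ** 2 \
            * sigma_hat ** 2 / omega_hat
    return w

if __name__ == "__main__":
    y = simulate_ar1(400, -0.8, 0.8, 200, seed=1)
    w = wald_ar1(y)
    print("argmax k:", int(np.nanargmax(w)) + 1, "max W_n(k):", np.nanmax(w))
```

The loop recomputes the scores for every $k$, which costs $O(n^2)$ operations; that is adequate for a sketch but could be replaced by running sums in a production setting.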
When we test the null $H_0$ against $\bigcup_{k\in[1,n)}H_{1n}(k)$, a natural test statistic
is $\max_{k\in[1,n)}W_n(k)$. However, this test statistic diverges to infinity; see [2].
We define the normalized Quandt-type Wald test statistic as

$$W_n(m) = \max_{m<k<n-m}\frac{W_n(k) - b_n(m)}{a_n(m)}, \tag{3.4}$$

where $a_n(m) = \sqrt{b_n(m)/(2\log\log n)}$, $b_n(m) = [2\log\log n + (m\log\log\log n)/2 - \log\Gamma(m/2)]^2/(2\log\log n)$
and $\Gamma(\cdot)$ is the gamma function. Our result for
testing for a change-point in model (3.1) is as follows.

Theorem 3.1. If Assumptions 3.1-3.3 are satisfied, then under the null
$H_0$, for any $x \in R$, $P[W_n(m) \le x] \to \exp(-2e^{-x/2})$ as $n \to \infty$.
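
The normalizing constants in (3.4) and the asymptotic critical values implied by Theorem 3.1 are straightforward to compute. The sketch below is an illustration only (the function names are assumptions): it evaluates $a_n(m)$ and $b_n(m)$ and inverts the limit law $\exp(-2e^{-x/2}) = 1 - \alpha$, which gives $x_\alpha = -2\log(-\tfrac{1}{2}\log(1-\alpha))$.

```python
import math

def norm_constants(n, m):
    """Return (a_n(m), b_n(m)) as defined after (3.4); requires n >= 3."""
    ll = math.log(math.log(n))          # log log n
    lll = math.log(ll)                  # log log log n
    inner = 2.0 * ll + 0.5 * m * lll - math.lgamma(m / 2.0)
    b = inner ** 2 / (2.0 * ll)
    a = math.sqrt(b / (2.0 * ll))
    return a, b

def critical_value(alpha):
    """Solve exp(-2 exp(-x / 2)) = 1 - alpha for x."""
    return -2.0 * math.log(-0.5 * math.log(1.0 - alpha))

def normalized_statistic(w_values, n, m):
    """Apply (3.4) to a sequence of W_n(k) values for m < k < n - m."""
    a, b = norm_constants(n, m)
    return (max(w_values) - b) / a

if __name__ == "__main__":
    for alpha in (0.10, 0.05, 0.01):
        print(alpha, critical_value(alpha))
    print(norm_constants(250, 1))
```

A test at level $\alpha$ would reject the null when the normalized statistic exceeds `critical_value(alpha)`; the caller is responsible for passing only the $W_n(k)$ values with $m < k < n - m$.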

Remark 3.1. Some weighted test statistics can be constructed along the
lines of [11] where optimality of related tests is also discussed. Assumptions
3.1(i)-(iii) and 3.2-3.3 were verified by Ling [27] for the AR-GARCH model.

4. Application to LM-FARIMA models. The time series $\{y_t\}$ is said to
follow a long-memory FARIMA($p$, $d$, $q$) model if

$$\phi(B)(1 - B)^d y_t = \psi(B)\varepsilon_t, \tag{4.1}$$

where $\phi(B) = 1 - \sum_{i=1}^{p}\phi_i B^i$, $\psi(B) = 1 + \sum_{i=1}^{q}\psi_i B^i$, $B$ is the
backward-shift operator, $d \in (0, 0.5)$ and $(1 - B)^d = \sum_{k=0}^{\infty}c_k B^k$ with
$c_k = (-d)(-d + 1)\cdots(-d + k - 1)/k!$, and $\{\varepsilon_t\}$ is a sequence of i.i.d. white noise variables.
Denote $\lambda = (d, \phi_1, \ldots, \phi_p, \psi_1, \ldots, \psi_q)'$. The parameter space $\Theta$ is a compact
subset of $R^{p+q+1}$. Assume that the true parameter $\lambda_0$ of $\lambda$ is an interior
point in $\Theta$ and, for each $\lambda \in \Theta$, it satisfies:

Assumption 4.1. $d \in (0, 0.5)$, $\phi(z) \ne 0$ and $\psi(z) \ne 0$ for all $z$ such that
$|z| \le 1$, $\phi_p \ne 0$, $\psi_q \ne 0$, and $\phi(z)$ and $\psi(z)$ have no common root.
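
The coefficients $c_k$ of $(1 - B)^d$ satisfy the recursion $c_0 = 1$, $c_k = c_{k-1}(k - 1 - d)/k$, which follows directly from the product formula above. A minimal sketch (the function name is an assumption for this example):

```python
import numpy as np

def frac_diff_coeffs(d, n_terms):
    """Coefficients c_0, ..., c_{n_terms-1} of (1 - B)^d = sum_k c_k B^k.

    Uses c_k / c_{k-1} = (-d + k - 1) / k, which follows from
    c_k = (-d)(-d + 1)...(-d + k - 1) / k!.
    """
    c = np.empty(n_terms)
    c[0] = 1.0
    for k in range(1, n_terms):
        c[k] = c[k - 1] * (k - 1 - d) / k
    return c

if __name__ == "__main__":
    print(frac_diff_coeffs(0.3, 6))
```

For integer $d = 1$ the recursion terminates and returns the ordinary first-difference filter $(1, -1, 0, \ldots)$, which is a convenient sanity check.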

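It is not hard to see that (4.1) is a special form of model (3.1). Following
common practice, we use quasi-log-likelihood estimation for $\lambda_0$ and the OFs
are

$$L_n(k, \lambda) = -\frac{1}{2}\sum_{t=1}^{k}\varepsilon_t^2(\lambda) \quad\text{and}\quad L_{1n}(k, \lambda_1) = -\frac{1}{2}\sum_{t=k+1}^{n}\varepsilon_t^2(\lambda_1), \tag{4.2}$$

where $\varepsilon_t(\lambda) = \psi^{-1}(B)\phi(B)(1 - B)^d y_t$. In this case, we have

$$D_t(\lambda) = -\frac{\partial\varepsilon_t(\lambda)}{\partial\lambda}\varepsilon_t(\lambda) \quad\text{and}\quad P_t(\lambda) = \frac{\partial\varepsilon_t(\lambda)}{\partial\lambda}\frac{\partial\varepsilon_t(\lambda)}{\partial\lambda'} + \frac{\partial^2\varepsilon_t(\lambda)}{\partial\lambda\,\partial\lambda'}\varepsilon_t(\lambda).$$

Let $\hat\lambda_n(k)$ and $\hat\lambda_{1n}(k)$ be the maximizers of $\tilde L_n(k, \lambda)$ and $\tilde L_{1n}(k, \lambda_1)$ on $\Theta$ for
each $k$ with the initial values $y_t = 0$ for $t \le 0$. The result for model (4.1) is
as follows.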
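
For the special case $p = q = 0$, the residuals reduce to $\varepsilon_t(d) = (1 - B)^d y_t$, and with the initial values $y_t = 0$ for $t \le 0$ the filter becomes a finite sum. The sketch below, whose function names are assumptions for illustration, computes these residuals and the two objective functions in (4.2).

```python
import numpy as np

def frac_residuals(y, d):
    """eps_t(d) = sum_{j=0}^{t-1} c_j y_{t-j}, t = 1..n, with y_t = 0 for t <= 0."""
    y = np.asarray(y, dtype=float)
    n = y.size
    c = np.empty(n)
    c[0] = 1.0
    for k in range(1, n):
        c[k] = c[k - 1] * (k - 1 - d) / k     # coefficients of (1 - B)^d
    # Truncated convolution: entry t uses only y_1, ..., y_{t+1} (0-based index t).
    return np.convolve(c, y)[:n]

def split_objectives(y, k, d, d1):
    """Objective functions of (4.2) for FARIMA(0, d, 0): first k observations
    evaluated at d, remaining n - k observations evaluated at d1."""
    e_first = frac_residuals(y, d)
    e_second = frac_residuals(y, d1)
    return -0.5 * np.sum(e_first[:k] ** 2), -0.5 * np.sum(e_second[k:] ** 2)

if __name__ == "__main__":
    y = np.array([1.0, 3.0, 6.0, 10.0])
    print(frac_residuals(y, 1.0))          # ordinary first differences
    print(split_objectives(y, 2, 0.0, 1.0))
```

Maximizing each objective over $d$ on a grid or with a scalar optimizer would then give the two subsample estimators; that step is left out to keep the sketch short.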
Theorem 4.1. If Assumption 4.1 holds and $E|\varepsilon_t|^{2+\iota} < \infty$ for some $\iota > 0$,
then under the null $H_0$, for any $x$, $P[W_n(p + q + 1) \le x] \to \exp(-2e^{-x/2})$
as $n \to \infty$.

Remark 4.1. For the linear processes with the long-memory parameter
$H = 1/2 + d_0$, Beran and Terrin [5] and Horváth and Shao [20] proposed
some tests for the change of $H$ in the frequency domain, but they did not
verify the conditions for model (4.1) and assumed that $E|\varepsilon_t|^{4+\iota} < \infty$. See
also [19]. As far as we know, our test statistic is new in the time domain and
is also different from the tests in [5] and [20].

Remark 4.2. To see the performance of the Wald test in finite samples,
we examine a small simulation for the FARIMA(0, $d$, 0) model with $\varepsilon_t \sim N(0, 1)$,
using Fortran 77. Sample sizes $n = 250$ and 400 are used. We first
study the size, for which we take $d_0 = 0.1, 0.2, 0.3$ and 0.4, and then the
power, for which we take $d_0 = 0.1$ and $d_{10} = 0.2, 0.3, 0.4$ with the
change-point $k = [0.5n]$ and $[0.9n]$, respectively. The results at the 0.1, 0.05 and
0.01 significance levels are reported in Table 1. When $n = 250$, the size is
very close to the nominal 0.01 level and is acceptable at the nominal 0.05
level, but is quite conservative at the nominal 0.1 level. When $n$ is increased
to 400, all size values are close to the nominal levels. Power increases when
$n$ increases from 250 to 400. When $k = [0.9n]$, the power is lower than when
$k = [0.5n]$. We also have the simulation results when $n = 200$. But in this
case, all size values are small and power is very low, and hence, they are not
reported here.
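
The data-generating step of such a simulation can be sketched as follows. This is not the Fortran 77 code used for Table 1; it is a hypothetical Python illustration that generates FARIMA(0, $d$, 0) data from the truncated expansion $y_t = (1 - B)^{-d}\varepsilon_t$, with the burn-in length, seed handling and function names chosen for the example.

```python
import numpy as np

def binom_coeffs(a, n_terms):
    """Coefficients of (1 - B)^a up to lag n_terms - 1 (a may be negative)."""
    c = np.empty(n_terms)
    c[0] = 1.0
    for k in range(1, n_terms):
        c[k] = c[k - 1] * (k - 1 - a) / k
    return c

def simulate_farima0d0(n, d, burn_in=1000, seed=None):
    """Simulate y_t = (1 - B)^{-d} eps_t with N(0, 1) errors.

    The infinite moving average is truncated at the start of the burn-in
    period; the first burn_in values are discarded. Returns (y, eps).
    """
    rng = np.random.default_rng(seed)
    total = n + burn_in
    eps = rng.standard_normal(total)
    psi = binom_coeffs(-d, total)            # weights of (1 - B)^{-d}
    y = np.convolve(psi, eps)[:total]
    return y[burn_in:], eps[burn_in:]

if __name__ == "__main__":
    y, eps = simulate_farima0d0(250, 0.1, seed=0)
    print(y[:5])
```

With `burn_in=0` the series starts from zero pre-sample values, so applying the filter $(1 - B)^d$ with the same zero initial values recovers the errors exactly; with a positive burn-in the recovery is only approximate, which mirrors the role of the initial condition in Assumption 3.3.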

5. Proofs of Theorems 2.1 and 2.2. This section gives the proofs of
Theorems 2.1 and 2.2.

Proof of Theorem 2.1. Let $S_n = \sum_{t=1}^{n}X_t$ and $p = 1 + \iota$. For $K = 1, 2, \ldots$,
let $l = [\sqrt{K}]$ and define $A_{t,K} = \|\sum_{j=1}^{K}[X_{t+j} - E(X_{t+j}|\mathcal{F}_l(t+j))]\|_p$
and $B_{t,K} = \|\sum_{j=1}^{K}E(X_{t+j}|\mathcal{F}_l(t+j))\|_p$. Since $E(X_{t+j}|\mathcal{F}_l(t+j))$ are
$l$-dependent and the $L^p(\nu)$ NED assumption holds, it can be readily shown that
there is a constant $\alpha > 0$ such that $A_{t,K} + B_{t,K} = O(K^{1-\alpha})$ uniformly in $t$.
So, $\|\sum_{j=1}^{K}X_{t+j}\|_p = O(K^{1-\alpha})$. By Proposition 1 of [37], we have

$$\left\|\max_{1\le t\le 2^k}|S_t|\right\|_p \le \sum_{r=0}^{k}\left[\sum_{i=1}^{2^{k-r}}\left\|S_{2^r i} - S_{2^r(i-1)}\right\|_p^p\right]^{1/p}.$$

Thus, for some $0 < \rho < 1$, $\|\max_{1\le t\le 2^k}|S_t|\|_p = O(2^{k\rho})$, from which (a) and
(b) follow easily. □
Table 1
Size and power of W_n(1) for testing change-point in FARIMA(0,d,0) models (1000 replications)

                  n = 250                      n = 400
          10%      5%       1%        10%      5%       1%

d_0     Sizes
0.1       0.055    0.039    0.012     0.081    0.049    0.015
0.2       0.059    0.037    0.012     0.083    0.046    0.014
0.3       0.064    0.038    0.010     0.078    0.047    0.012
0.4       0.050    0.031    0.010     0.077    0.041    0.014

d_10    Power when d_0 = 0.1 and k = [0.5n]
0.2       0.168    0.120    0.040     0.304    0.235    0.126
0.3       0.333    0.260    0.114     0.655    0.566    0.403
0.4       0.658    0.571    0.387     0.924    0.886    0.791

d_10    Power when d_0 = 0.1 and k = [0.9n]
0.2       0.135    0.089    0.022     0.180    0.114    0.056
0.3       0.181    0.124    0.056     0.303    0.225    0.106
0.4       0.424    0.334    0.197     0.582    0.498    0.312

To prove Theorem 2.2, we need the following lemma which is used for 
(5.2), (5.7) and (5.10). 



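Lemma 5.1. Let $X_t$ be defined as in Theorem 2.2. Then (a)

$$\left\|\sum_{t=i+1}^{j}\{X_t - E[X_t|\mathcal{F}_{t-i}(t)]\}\right\|_{2+\iota}\left[\sum_{t=i+1}^{j}\frac{1}{(t-i)^{2\nu}}\right]^{-1/2} = O(1)$$

uniformly in $j$ and $i < j$, and (b) furthermore, if (2.1) holds, then we have

$$\left\|\sum_{t=1}^{j-1}\{E[X_{-t}|\mathcal{F}_{-t+j}(-t)] - E[X_{-t}|\mathcal{F}_{-t+j-1}(-t)]\}\right\|_{2+\iota_1} = O(1).$$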

Proof. Let $p = 2 + \iota$ and $\xi_{t,k} = X_t - E[X_t|\mathcal{F}_k(t)]$ for $k > 0$. Since $X_t$ is
an $\mathcal{F}_t$-measurable martingale difference and $\{\varepsilon_t\}$ is independent, we know
that $\xi_{t,t-i}$ is an $\mathcal{F}_t$-measurable martingale difference. By Definition 2.1,
$\sup_{i<j}\sup_{i<t\le j}[\|\xi_{t,t-i}\|_p(t-i)^{\nu}] \le \sup_{0<k<\infty}\sup_{-\infty<t<\infty}(\|\xi_{t,k}\|_p k^{\nu}) = O(1)$.
By Burkholder's inequality in [10], page 384, there exists a constant $B$,
depending only on $\iota$, such that

$$E\left|\sum_{t=i+1}^{j}\xi_{t,t-i}\right|^{p} \le B\,E\left(\sum_{t=i+1}^{j}|\xi_{t,t-i}|^2\right)^{p/2} \le B\left(\sum_{t=i+1}^{j}\|\xi_{t,t-i}\|_p^2\right)^{p/2},$$

where the last step uses Minkowski's inequality. Thus, (a) holds. Since
$E[X_{-t}|\mathcal{F}_{-t+j}(-t)] - E[X_{-t}|\mathcal{F}_{-t+j-1}(-t)]$ is an $\mathcal{F}_{-t}$-measurable martingale
difference and $2\nu_1 > 1$, similarly, we can prove that (b) holds. □

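Proof of Theorem 2.2. By Theorem 2 in [13], the proof of (a) is
much easier than that of (b). So, only the latter is presented here.

Let $X_{t,i}^{(0)} = E[X_t|\mathcal{F}_{-i+1}(0)] - E[X_t|\mathcal{F}_{-i}(0)]$ for $i \le -1$. Note that
$E(X_t|\mathcal{F}_{-t}(0)) = EX_t = 0$ when $t \le -1$. We have the decomposition

$$\begin{aligned}
\sum_{t=-k}^{-1}X_t &= \sum_{t=-k}^{-1}\{X_t - E[X_t|\mathcal{F}_{k+1}(0)]\} + \sum_{t=-k}^{-1}E[X_t|\mathcal{F}_{k+1}(0)] \\
&= \sum_{t=-k}^{-1}\{X_t - E[X_t|\mathcal{F}_{k+1}(0)]\} + \sum_{i=-k}^{-1}\sum_{t=i}^{-1}X_{t,i}^{(0)} \\
&= \sum_{t=-k}^{-1}\{X_t - E[X_t|\mathcal{F}_{k+1}(0)]\} + \sum_{j=1}^{k}\sum_{t=1}^{j}X_{-t,-j}^{(0)}.
\end{aligned} \tag{5.1}$$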

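Note that $E[X_t|\mathcal{F}_{k+1}(0)] = E[X_t|\mathcal{F}_{t+k+1}(t)]$ when $t \le 0$ and $t + k + 1 \ge 0$.
By Lemma 5.1(a), $\|\sum_{t=-k}^{-1}\{X_t - E[X_t|\mathcal{F}_{k+1}(0)]\}\|_{2+\iota} = O([\sum_{t=-k}^{-1}(t + k + 1)^{-2\nu}]^{1/2}) = O[(\sum_{t=1}^{k}t^{-2\nu})^{1/2}]$.
Thus, by the Cauchy-Schwarz inequality, for any $\epsilon > 0$, we have

$$\begin{aligned}
P\left(\max_{l\le k}\left|\frac{1}{k^{1/2-\delta}}\sum_{t=-k}^{-1}\{X_t - E[X_t|\mathcal{F}_{k+1}(0)]\}\right| > \epsilon\right)
&\le \sum_{k=l}^{\infty}P\left(\left|\frac{1}{k^{1/2-\delta}}\sum_{t=-k}^{-1}\{X_t - E[X_t|\mathcal{F}_{k+1}(0)]\}\right| > \epsilon\right) \\
&\le \frac{1}{\epsilon^{2+\iota}}\sum_{k=l}^{\infty}\frac{1}{k^{(1/2-\delta)(2+\iota)}}E\left|\sum_{t=-k}^{-1}\{X_t - E[X_t|\mathcal{F}_{k+1}(0)]\}\right|^{2+\iota} \\
&\le \frac{O(1)}{\epsilon^{2+\iota}}\sum_{k=l}^{\infty}\frac{1}{k^{(1/2-2\delta)(2+\iota)}}\left(\sum_{t=1}^{k}\frac{1}{t^{2\nu+2\delta}}\right)^{1+\iota/2} \\
&= O\left(\frac{1}{l^{(1/2-2\delta)(2+\iota)-1}}\right),
\end{aligned} \tag{5.2}$$

as $\delta > 0$ is small enough such that $(1/2 - 2\delta)(2 + \iota) > 1$. By Lemma 1 in [10],
page 31 and (5.2),

$$\sum_{t=-k}^{-1}\{X_t - E[X_t|\mathcal{F}_{k+1}(0)]\} = O(k^{1/2-\delta}) \quad\text{a.s.} \tag{5.3}$$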
The second term in (5.1) can be rewritten as

$$\sum_{j=2}^{k+1}Y_{0j} \quad\text{and}\quad Y_{0j} = \sum_{t=1}^{j-1}X_{-t,-j+1}^{(0)}.$$

By (5.3), it is sufficient for (b) to show that we can define a sequence of i.i.d.
$m \times 1$ normal vectors $\{G_{2t}\}$ with mean zero and covariance $\Omega$ such that

$$\sum_{j=2}^{k+1}Y_{0j} - \sum_{j=1}^{k}G_{2j} = O(k^{1/2-\delta}) \quad\text{a.s.} \tag{5.4}$$

Since $X_{-t,-j+1}^{(0)} \in \mathcal{F}_j(0)$, we know that $Y_{0j} \in \mathcal{F}_j(0)$ and $E(Y_{0j}|\mathcal{F}_{j-1}(0)) = 0$.
Thus, $\{Y_{0j}, \mathcal{F}_j(0), j = 1, 2, \ldots\}$ is a sequence of forward martingale
differences. Using the strong invariance principle in [13], Theorem 1, it is sufficient
for (5.4) to verify the following conditions:

(i) there exists an $\iota > 0$ such that $E|Y_{0j}|^{2+\iota} \le M$, a constant, uniformly
in $j$;

(ii) for some $\delta > 0$, the following holds uniformly in $s$:

$$\frac{1}{n^{1-\delta}}\sum_{j=s+1}^{s+n}[E(Y_{0j}Y_{0j}') - \Omega] = O(1), \tag{5.5}$$

$$E\left|\frac{1}{n^{1-\delta}}\sum_{j=s+1}^{s+n}[E(Y_{0j}Y_{0j}'|\mathcal{F}_s(0)) - E(Y_{0j}Y_{0j}')]\right| = O(1). \tag{5.6}$$

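Note that $X_{-t,-j+1}^{(0)} = E[X_{-t}|\mathcal{F}_j(0)] - E[X_{-t}|\mathcal{F}_{j-1}(0)]$ and $E[X_{-t}|\mathcal{F}_j(0)] = E[X_{-t}|\mathcal{F}_{-t+j}(-t)]$
when $t \ge 1$ and $-t + j \ge 0$. When $2\nu > 1$, by Minkowski's
inequality and Lemma 5.1(a), it follows that, uniformly in $j$,

$$\begin{aligned}
E|Y_{0j}|^{2+\iota} &= E\left|\sum_{t=1}^{j-1}X_{-t,-j+1}^{(0)}\right|^{2+\iota} \\
&\le O(1)E\left|\sum_{t=1}^{j-1}\{X_{-t} - E[X_{-t}|\mathcal{F}_j(0)]\}\right|^{2+\iota} + O(1)E\left|\sum_{t=1}^{j-1}\{X_{-t} - E[X_{-t}|\mathcal{F}_{j-1}(0)]\}\right|^{2+\iota} \\
&= O(1).
\end{aligned} \tag{5.7}$$

That is, (i) holds. When $2\nu = 1$ and (2.1) is satisfied, (i) holds by Lemma
5.1(b).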

For (ii), we make a decomposition as 



fs-l 



YojY oj 



/s-l 



(0) 

-t-j+1 
P- 1



(0) 



(5.8) 



,t=s 

/i-i 



-i 



(0) 



L \t=s 
/a-1 



\t=l 



-Bijs + -B2js + B 3 j s . 



a= 



We first show that 



s+n 



(5.9) 



^ E (EB 2js -n) = o(l), 



3=8+1 



uniformly in s. Let Z t j = E[X- t \Fj(0)]- Since E(X { ®}_ j+l \F^i) = 0, 

s+n s+n j—1 

E E E^-i +1 ^-i + i) 

j=s+l j=s+l t=s 

s+n j—1 s+n j-l 

= E E^A)- E E^-i^-i) 

j=S+l t=S j = s + l t=s 

s+n j-l s+n-1 j 

= E E^A)- E E^A) 

3=s+l t=s j=s t=s 

s+n— 1 n+s—1 

= ^2 E ( z t,s+nZ' tiS+n ) - E(Z SjS Z' s s ) - ^ E ( Z j,j Z j,j) 

t=S j = s + l 

s+n— 1 

= E E ( z t,s+n z t, s +n)-> 

t=s 

where Zjj = E(X_j\J r j(0)) = is used since X_j is independent of .Fj(O). 
Since Z t j = E[X-t\ ^-t+j{— t)], by the near-epoch dependence of X t , 
s+n— 1 

E E ( Z t,s+n Z 't,s+n ~ X -tX'_ t ) 
t=s 



n 



1-5 



< 



1 



1-5 



J? 



s+n— 1 s+n— 1 

E E\X. t -Z ttS+n \ 2 + 2 E(\X^ t -Z ttS+n \\X_ t \) 

t=S t=S 



< o 



1 



n 



1-5 



s+n— 1 

E 

L t=s 



1 



s+n— 1 



(n + s-t) 



2v 



+ (ElX.t-Z^ElX^l 2 ) 1 / 2 



t=s 



o 
71



l-<5 



£ 

t=s 



1 



(n + s-i) 



o(l), 



for 8 < v, where O(l) and o(l) hold uniformly in s. Thus, (5.9) holds. 

Since X_J_j +1 is an JF-^-measurable martingale difference, we haveE\B\j s \ < 

E \ Et=l x -l-j+i\ 2 = Et=i E \ X -l-j+i\ 2 - when 2i/ > 1, by the near-epoch 
dependence of Xt, it follows that, uniformly in j > s, 



s-l 



£ E\X ( -l- J+ i\ 2 < O(l) £{£|X_ 4 - Ztj\* + E \ X ~t ~ Zt,-i\ 2 } 



s-l 



1=1 



1=1 

s-l 



< 



0(1) £ 



2u ' 



When 2u = 1, by (2.1) and Minkowski's inequality, ^xl^ . +1 | 2 < 0(l)(j - 
t)~ 2ui and 2u 1 >l. Letting v = 2v or 2i/i according as 2u > 1 or 2f = 1 , we 
have, for < 8 < (y - l)/2, 



0(1) 



-n s— 1 



^rza £ ^l^iisl <^iza E E 



1 



i=s+l 



J=S+1 t= 

+n 



(5.10) 



1 



= o(i), 



a (i " *)* 
1 



£ 



i=s+ i (i - s) 1 - 2 ^ 5 s (- - tf- & 



uniformly in s as n — > oo, for 25 < 5 < v — 1, where we have used j — s < 
min{j — t,n} and j — t > s — t. By (5.9)-(5.10) and the Cauchy-Schwarz 
inequality, we can show that 



n+s 



(5.11) 



ir 



lj £ S|5 3js | = 0(l). 



By (5.8)-(5.11), we can establish (5.5). Since B^js G -^-s is independent 
of F s (0) when j > s, E[B 2js \F s (0)] = EB 2js . By (5.8)-(5.11), uniformly in 
s, we have 



(5.12) 



1 



1-8 



E 



n 



n+s 



J2 {E[Y oj Y^\T s (0)]-n} 

j=s+l 



0(1). 



By (5.5) and (5.12), (5.6) holds. □ 
6. Proof of Theorem 3.1. We first present three lemmas. Lemma 6.1
comes directly from Theorem 2.2, while Lemma 6.2 can be proved by using
Theorem 2.1 and the details are given in [28].

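Lemma 6.1. If Assumptions 3.1(ii) and 3.2(ii) hold, then in the sense
of Theorem 2.2, we can define i.i.d. $m \times 1$ normal vector sequences, $\{G_{1t}\}$
and $\{G_{2t}\}$, with mean zero and covariance $\Omega$ such that, for some $\delta > 0$,

$$\text{(a)}\ \max_{g_n<k<n}k^{\delta}\left|\frac{1}{\sqrt{k}}\sum_{t=1}^{k}D_t(\lambda_0) - \frac{1}{\sqrt{k}}\sum_{t=1}^{k}G_{1t}\right| = o_p(1),$$

$$\text{(b)}\ \max_{g_n<k<n}k^{\delta}\left|\frac{1}{\sqrt{k}}\sum_{t=-k}^{-1}D_t(\lambda_0) - \frac{1}{\sqrt{k}}\sum_{t=1}^{k}G_{2t}\right| = o_p(1).$$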



Lemma 6.2. Let @o(k,rj) = {X:k L \X — Xq\ < rf\. Suppose Assump- 
tions 3.1(iii)-(iv), 3.2(iii) and 3.3 (iii) hold. For any e > 0, (1) if 1=0, then 
there is n > such that the following (a)-(b) hold with 5 = 0, and (2) if 
I > 0, then there is 5 > such that the following (a)-(b) hold for any fixed 
i] > 0: 



lim P\ max max — — t 



f=i 



> e 



0, 



(b) lim P\ max 



1 



max 



g n <n-k<n@ (n—k,r)) (n — k) 



1-5 



t=k+l 



> e 



Lemma 6.3. If the assumptions of Theorem 3.1 hold, then there exists 
a 5 > such that X n (k) and Xi n (k) have the uniform expansions 

—l ^ 

Vk[X n (k)-X ]-^ 7F Y,D t (X ) 



(a) max k 

g n <k<n 



(b) max (n — k) 

g n <n—k<n 



Vk t=1 
V n - k[Xi n (k) - X ] 



Op(l), 



\Jn — k 



E A(A ) 



t=k+l 



o P (l) 



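Proof. We only prove part (b). By Lemma A.1(b) in the Appendix,

$$\begin{aligned}
P\left(\max_{g_n<n-k<n}|\hat\lambda_{1n}(k) - \lambda_0| > \epsilon\right)
&= P\left\{|\hat\lambda_{1n}(k) - \lambda_0| > \epsilon,\ \sum_{t=k+1}^{n}[\tilde l_t(\hat\lambda_{1n}(k)) - \tilde l_t(\lambda_0)] \ge 0,\ \text{for some } k \in [1, n - g_n]\right\} \\
&\le P\left\{\max_{g_n<n-k<n}\sup_{|\lambda-\lambda_0|>\epsilon}\sum_{t=k+1}^{n}[\tilde l_t(\lambda) - \tilde l_t(\lambda_0)] \ge 0\right\} = o(1),
\end{aligned}$$

for any $\epsilon > 0$ and as $n \to \infty$. Thus,

$$\max_{g_n<n-k<n}|\hat\lambda_{1n}(k) - \lambda_0| = o_p(1). \tag{6.1}$$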



Using Taylor's expansion for each element of dL± n (k, Xi n {k))/dX = 0, we 
have 

\ -i 



(6.2) Xm(k) - A 



1 



n — k 



E K 



1 



t=k+l 



n—k 



E A(A 



oj 



t=fc+i 



for each fe, where the ith row of P* t is the ith row of ft(A*„ (A;)) for some 
A*„ (&) such that |A*^ (fc) — Ao| < |Ai ra (/c) — Ao| for i = 1,. .. ,m. Observing 
that Dt(Xo) is strictly stationary, by Lemma 6.1(b), the law of iterated 
logarithm (LIL) and Assumption 3.3(h), it follows that, for any Sq > 0, 



(6.3) 



max 

g n <n—k<n 



Op(l). 



Let 5i G (0, 1/2). By Lemma 6.2(b) with I = and (6.1)-(6.3), 



(6.4) 



max \(n-k) Sl [X ln (k)-X ]\ = o p (l). 

g n <n—k<n 



By (6.2)-(6.4) and Lemma 6.2(b) with 1= Si, there exists some 5 > such 
that 



max (n — k) 

g n <n—k<n 



max 

g n <n—k<n 



Vn-k[Xi n (k) - A ] 



y-1 « 



t=fc+l 



1 



n—k 



E ^ 



t=fc+i 



(n — /c) 



1 

1-2(5 



E ( S " ^ 



7 ; U/2+5 E A(Ao] 



Op(l). 



Furthermore, by Assumption 3.3(h), (b) holds. □ 

We further need two lemmas, which are directly used for Theorem 3.1. 



Lemma 6.4. Under the assumptions of Theorem 3.1, it follows that 
(a) max \W n (k) - S n (k)\ = o p (l) and (b) max W n (k) = O p (g n ), 
where Tl n = [logn, n — logn] and

2 



Sn(k) 



+ 



\Jn — k 



E A(A ) 



t=&+i 

Proof, (a) By Lemma 6.3, we have 

E 



71 



(6.5) 



(6.6) max 

fcen„ 



max 
fcgn„ 



V^[An(A;) - A ] 



Vn - fc[Ai„(fc) - Ao] 



\Jn—k 



TrEA(Ao) 

n 

E A(Ao) 



t=fc+i 



O p (log- s n), 



OJlog- s n), 



for some 5 > 0. As for (6.3), by Lemma 6.1 and the LIL, we can show that 
(6.7) max \Vk[X n (k)-X }\=O p [(loglogn)^ 2 ], 



(6.8) 



fcen„ 

maxlv 7 " - k[X ln (k) - A ]| = OJ(loglogn) 1/2 ]. 
fcen n 



By Lemma 6.2(a)— (b) and Lemma 6.3, we know that maxj.gn„ |E n 

E| = O p (n~ s ). By Lemma A. 2 in the Appendix, we have maxfc g n„ \&n(k) 

jn - fi| = O p (n"' 5 ). Furthermore, by (6.7)-(6.8), it follows that 

k(n — k) » 



(6.9) max 

fcen„ 



W„(fc) 



7? 



■[A„(A;)-Ai n (fc)]'no[An(fc)-Ai n (fc)] 



Op(l). 



where = Efi E. Denote 



/c(n — /c) 



7?. 



EA(Ao) 



1 



t=i 



n—k 



E A (Ac 



i=fe+l 



By (6.5)-(6.6), we have 



(6.10) max 

fcgn„ 



k) -[K(k)-^m(k))-^- l Uk) 



11 



O p (\og- 5 n). 



By (6.7)-(6.8) and (6.10), it follows that 



(6.11) 



max 
fcen„ 



k(n — k) 



7? 



[AnW-AmCAOl-E- 1 ^) fi 



fc(n *%„(£) -A ln (fc)] 



7? 



:Op(l). 



By (6.9) and (6.11), we can show that maxj. e n„ |Wn(fc) — £^(/c)f2 _1 £ n (£;)| = 
o p (l). By direct calculation, we have S n (k) = £^(/c)f2 _1 £ n (/c). Thus, (a) 
holds. The proof of (b) is easy and can be found in [28]. □ 
Lemma 6.5. Let $\{G_t, t = 1, 2, \ldots\}$ be an i.i.d. sequence of $m \times 1$ random
vectors with $EG_t = 0$ and $E(G_t G_t') = I$. If $E|G_t|^{2+\iota} < \infty$ for some $\iota > 0$,
then, for each $\mu \in (0, 1)$ and for any $x$,

$$P\left(\frac{1}{a_n(m)}\left[\max_{\log n<k<\mu n}\left|\frac{1}{\sqrt{k}}\sum_{t=1}^{k}G_t\right|^2 - b_n(m)\right] \le x\right) \to \exp(-e^{-x/2})$$

as $n \to \infty$, where $a_n(m)$ and $b_n(m)$ are defined in Theorem 3.1.

Proof. The lemma can be proved readily by using Lemma 2.2 of [17]
and Corollary A.2 of [12]. □



Proof of Theorem 3.1. Let S n (k) be defined as in Lemma 6.4 and 
denote 

Q-l/2 k „ Q-V2 n 

Si(fc) = — ^J^ACAo) and S 2 „(£;) = — == £ A(A ). 

Let ^, 6 (0,0.5). By Lemma 6.1(a) and the continuous mapping theorem, 
max S n (k) — max |Si(&)| 2 

logn<fc</m log n<k<fm 

< max \S n (k) — \S\(k)\ 2 \ 

logn<k<fin 

= max ||S 2n (^)| 2 - |-Si(n)| 2 | 

log n<k<fj,n 



>r max 

0<T<At 



\B{\)-B{r) 



1-T 



\B(l) 



as n — > oo, where — >£ denotes convergence in distribution and {B(t) :t £ 
[0, 1]} is a standard Brownian motion. Thus, for any e > 0, 



(6.12) limsupP 

n— »oo 

as /i — ► 0. Similarly, as fj, — > 0, 



max S n (k) — max |S'i(fc)|'' 

logn<fc<M n 1°S n<k</in 



> € 



0, 



(6.13) limsupP 



max 5* n (A;) — max l-S^nC^)!' 

$n<n—k<fin logn<n—k<iin 



> e 



0. 



Denote Bi(jfe) = fT 1 / 2 Et=i Gu/^fe and -B 2 (A;) = VL~ 1 / 2 Et=L fe GVv'fc, where 
{Git} and {G 2 t} are defined as in Lemma 6.1. By Lemma 6.1(a), for each 
/z, we have 

max |Si(A;)| 2 — max 

logn<k<fm logn<k<fj,n 

< max Wl^kiMk) - B 1 (k)}\ {Bl{k ;] 1 t lS ) (k)l = o p (l), 
as $n \to \infty$. Furthermore, by Lemma 6.5(a), for each $\mu$ and $x$, it follows that



(6.14) lim P 



max \Si{k)\ 2 — b n {m) 

rn<k<fm 



a n (m) < x ) = exp(— e x l 2 ). 



Applying the same argument to $S_2^*(k) = (k\Omega)^{-1/2} \sum_{t=-k}^{-1} D_t(\lambda_0)$ and $B_2(k)$ with the help of Lemma 6.1(b), and observing that $\max_{\log n \le k \le \mu n} |S_2^*(k)|^2$ has the same distribution as $\max_{\log n \le n-k \le \mu n} |S_{2n}(k)|^2$, we have

$$\lim_{n \to \infty} P\left( \frac{\max_{\log n \le n-k \le \mu n} |S_{2n}(k)|^2 - b_n(m)}{a_n(m)} \le x \right) = \exp(-e^{-x/2}). \tag{6.15}$$



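Using a similar method as for (5.2), we can show that

$$A_n \equiv \max_{\log n \le n-k \le \mu n} \left| \frac{\Omega^{-1/2}}{(n-k)^{0.5-\delta}} \sum_{t=k+1}^{n} \big\{ D_t(\lambda_0) - E[D_t(\lambda_0) \mid \mathcal{F}_{t-k}(t)] \big\} \right| = o_p(1)$$

for some $\delta > 0$. Let $S'_{2n}(k) = \Omega^{-1/2} \sum_{t=k+1}^{n} E[D_t(\lambda_0) \mid \mathcal{F}_{t-k}(t)]/\sqrt{n-k}$. So,

$$\begin{aligned}
\Big| \max_{\log n \le n-k \le \mu n} |S'_{2n}(k)|^2 - \max_{\log n \le n-k \le \mu n} |S_{2n}(k)|^2 \Big|
&\le \max_{\log n \le n-k \le \mu n} \big| |S_{2n}(k)|^2 - |S'_{2n}(k)|^2 \big| \\
&\le A_n^2 + 2A_n \max_{\log n \le n-k \le \mu n} \left| \frac{1}{(n-k)^{0.5+\delta}} \sum_{t=k+1}^{n} D_t(\lambda_0) \right| = o_p(1),
\end{aligned}$$

where the last step holds by Lemma 6.1(b), the LIL and the strict stationarity of $\{D_t(\lambda_0)\}$. Furthermore, by (6.15), for each $\mu$ and $x$, it follows that

$$\lim_{n \to \infty} P\left( \frac{\max_{\log n \le n-k \le \mu n} |S'_{2n}(k)|^2 - b_n(m)}{a_n(m)} \le x \right) = \exp(-e^{-x/2}).$$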



By (6.14), the above two equations and independence of $\max_{\log n \le n-k \le \mu n} |S'_{2n}(k)|^2$ and $\max_{\log n \le k \le \mu n} |S_1(k)|^2$, for each $\mu \in (0, 0.5)$ and $x$, it follows that

$$P\left( \frac{\max\big\{ \max_{\log n \le k \le \mu n} |S_1(k)|^2, \; \max_{\log n \le n-k \le \mu n} |S_{2n}(k)|^2 \big\} - b_n(m)}{a_n(m)} \le x \right) = \exp(-2e^{-x/2}) + o(1).$$



Since $a_n(m) = 1 + o(1)$, by (6.12)-(6.13) and the preceding equation, we can show that, for each $x$ and any $\epsilon > 0$, there exist $N > 0$ and a constant $\mu_0 \in (0, 0.5)$ such that, for $n > N$,

$$\left| P\left( \frac{\max\big\{ \max_{\log n \le k \le \mu_0 n} S_n(k), \; \max_{\log n \le n-k \le \mu_0 n} S_n(k) \big\} - b_n(m)}{a_n(m)} \le x \right) - \exp(-2e^{-x/2}) \right| < \frac{\epsilon}{2}.$$

By Lemma 6.1(a) and the continuous mapping theorem, we have

$$\max_{\mu_0 n \le k \le n - \mu_0 n} S_n(k) \to_{\mathcal{L}} \max_{\mu_0 \le \tau \le 1 - \mu_0} \left[ \frac{|B(\tau)|^2}{\tau} + \frac{|B(1) - B(\tau)|^2}{1 - \tau} - |B(1)|^2 \right]$$

as $n \to \infty$. By the preceding two equations, for any $x$, we can show that

$$\lim_{n \to \infty} P\left( \frac{\max_{k \in [\log n, \, n - \log n]} S_n(k) - b_n(m)}{a_n(m)} \le x \right) = \exp(-2e^{-x/2}).$$

Finally, by Lemma 6.4(a)-(b), the conclusion holds. □
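The limit law reached at the end of this proof has distribution function $\exp(-2e^{-x/2})$, which can be inverted in closed form. The following is a minimal Python sketch of that inversion; the function names `limit_cdf` and `critical_value` are illustrative and not from the paper. The value returned applies to the normalized statistic, that is, after subtracting $b_n(m)$ and dividing by $a_n(m)$ as defined in Theorem 3.1.

```python
import math


def limit_cdf(x: float) -> float:
    """Distribution function exp(-2 * exp(-x / 2)) of the limit law."""
    return math.exp(-2.0 * math.exp(-x / 2.0))


def critical_value(alpha: float) -> float:
    """Upper-alpha quantile: the x solving limit_cdf(x) = 1 - alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Invert exp(-2 exp(-x/2)) = 1 - alpha step by step.
    return -2.0 * math.log(-0.5 * math.log(1.0 - alpha))


if __name__ == "__main__":
    for alpha in (0.10, 0.05, 0.01):
        x = critical_value(alpha)
        # The last column should reproduce 1 - alpha.
        print(f"alpha={alpha:.2f}  x={x:.4f}  cdf(x)={limit_cdf(x):.4f}")
```

In use, the no-change-point hypothesis would be rejected at level $\alpha$ when the normalized statistic exceeds `critical_value(alpha)`.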



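7. Proof of Theorem 4.1. It is sufficient for Theorem 4.1 to verify Assumptions 3.1-3.3. For simplicity, we only consider the case with $p = q = 0$, while the general case can be similarly verified.

In this case, the following expansions hold:

$$y_t = \sum_{i=0}^{\infty} c_{0i}\, \varepsilon_{t-i} \quad\text{and}\quad \varepsilon_t(\lambda) = (1 - B)^d y_t = \sum_{i=0}^{\infty} a_{0i}(\lambda)\, y_{t-i}, \tag{7.1}$$

where $c_{00} = a_{00}(\lambda) = 1$, $c_{0i} = O(i^{-1+d_0})$ and $a_{0i}(\lambda) = O(i^{-1-d})$. We further have

$$\frac{\partial \varepsilon_t(\lambda)}{\partial d} = \log(1 - B)(1 - B)^d y_t, \qquad \frac{\partial^2 \varepsilon_t(\lambda)}{\partial d^2} = \log^2(1 - B)(1 - B)^d y_t$$

and

$$\frac{\partial^3 \varepsilon_t(\lambda)}{\partial d^3} = \log^3(1 - B)(1 - B)^d y_t,$$

where $\log(1 - B) = -\sum_{i=1}^{\infty} B^i/i$ and $\log^k(1 - B)(1 - B)^d = \sum_{i=1}^{\infty} a_{ki}(\lambda) B^i$ with $a_{ki}(\lambda) = O(i^{-1-d} \log^k i)$ for $k = 1, 2, 3$.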
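For $p = q = 0$ the coefficients in (7.1) are the power-series coefficients of $(1 - B)^d$ and $(1 - B)^{-d_0}$. The sketch below, in Python, computes them with the standard binomial-series recursion $a_i = a_{i-1}(i - 1 - d)/i$ (a textbook identity, not spelled out in the text) and checks numerically that the two filters invert each other and decay at the stated rates. The function names and the value $d_0 = 0.3$ are illustrative assumptions.

```python
import math


def frac_diff_coeffs(d: float, n: int) -> list[float]:
    """First n power-series coefficients of (1 - B)^d."""
    coeffs = [1.0]
    for i in range(1, n):
        # Binomial-series recursion: a_i = a_{i-1} * (i - 1 - d) / i.
        coeffs.append(coeffs[-1] * (i - 1 - d) / i)
    return coeffs


def convolve(a: list[float], c: list[float]) -> list[float]:
    """Coefficients of the product of two truncated power series."""
    n = min(len(a), len(c))
    return [sum(a[i] * c[j - i] for i in range(j + 1)) for j in range(n)]


if __name__ == "__main__":
    d0, n = 0.3, 200                  # illustrative values only
    a = frac_diff_coeffs(d0, n)       # a_{0i}(lambda_0): (1 - B)^{d0}
    c = frac_diff_coeffs(-d0, n)      # c_{0i}: (1 - B)^{-d0}
    prod = convolve(a, c)
    # (1 - B)^{d0} (1 - B)^{-d0} = 1: leading term 1, the rest near 0.
    print(prod[0], max(abs(p) for p in prod[1:]))
    i = n - 1
    # Rescaled coefficients settle near 1/|Gamma(-d0)| and 1/Gamma(d0).
    print(abs(a[i]) * i ** (1 + d0), 1 / abs(math.gamma(-d0)))
    print(c[i] * i ** (1 - d0), 1 / math.gamma(d0))
```

The slow decay of `c` is the long-memory feature that makes the near-epoch dependence rates in the following proof polynomial rather than geometric.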

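Proof of Assumptions 3.1-3.2. By Assumption 4.1, $\{y_t\}$ is strictly stationary with $E|y_t|^{2+\iota} < \infty$. Since $\Theta$ is compact, there exist constants $\underline{d}$ and $\bar{d}$ such that $0 < \underline{d} < \bar{d} < 0.5$ and $d_0 \in (\underline{d}, \bar{d})$. Thus, $\sup_{\lambda \in \Theta} |a_{ki}(\lambda)| = O(i^{-1-\underline{d}} \log^k i)$ for $k = 0, 1, 2, 3$, and

$$\sup_{\Theta} |\varepsilon_t(\lambda)| = \sup_{\Theta} \Big| \sum_{i=0}^{\infty} a_{0i}(\lambda)\, y_{t-i} \Big| \le |y_t| + O(1) \sum_{i=1}^{\infty} \frac{|y_{t-i}|}{i^{1+\underline{d}}}.$$

Treating $\sup_{\Theta} |\varepsilon_t(\lambda)|$ and $y_t$ as elements in the $L^{2+\iota}$ space, we have

$$\Big\| \sup_{\Theta} |\varepsilon_t(\lambda)| \Big\|_{2+\iota} \le O(1) \Big[ \|y_t\|_{2+\iota} + \sum_{i=1}^{\infty} \frac{\|y_{t-i}\|_{2+\iota}}{i^{1+\underline{d}}} \Big] < \infty, \tag{7.2}$$

that is, the first part of Assumption 3.1(i) holds. The proof of the second part and Assumption 3.1(ii) can be found in [29]. Similar to (7.2), it can be proved that $\| \sup_{\Theta} [\, |\partial^2 \varepsilon_t(\lambda)/\partial d^2| + |\partial^3 \varepsilon_t(\lambda)/\partial d^3| \,] \|_{2+\iota} < \infty$. Thus, we can show that Assumption 3.1(iii)-(iv) holds. For a (large) integer $K$,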



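$$\Big\| \sup_{\Theta} \Big| \varepsilon_t(\lambda) - \sum_{i=0}^{K} a_{0i}(\lambda)\, y_{t-i} \Big| \Big\|_{2+\iota} \le O(1) \sum_{i=K+1}^{\infty} \frac{\|y_{t-i}\|_{2+\iota}}{i^{1+\underline{d}}} = O\Big( \frac{1}{K^{\underline{d}}} \Big).$$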



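When $k > i$, by Lemma 2 in [35], it follows that

$$\big\| y_{t-i} - E[y_{t-i} \mid \mathcal{F}_k(t)] \big\|_{2+\iota}^{2+\iota} = E\Big| \sum_{j=k-i}^{\infty} c_{0j}\, \varepsilon_{t-i-j} \Big|^{2+\iota} \le O(1) \Big( \sum_{j=k-i}^{\infty} c_{0j}^2 \Big)^{1+\iota/2} = O\Big( \frac{1}{(k-i)^{(1-2d_0)(1+\iota/2)}} \Big).$$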



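Let $K = [k/2]$. By the expansion of $\varepsilon_t(\lambda)$ in (7.1) and the preceding two equations,

$$\begin{aligned}
\Big\| \sup_{\Theta} \big| \varepsilon_t(\lambda) - E[\varepsilon_t(\lambda) \mid \mathcal{F}_k(t)] \big| \Big\|_{2+\iota}^{2+\iota}
&\le \Big[ O\Big( \frac{1}{K^{\underline{d}}} \Big) + O(1) \sum_{i=0}^{K} \sup_{\Theta} |a_{0i}(\lambda)| \, \big\| y_{t-i} - E[y_{t-i} \mid \mathcal{F}_k(t)] \big\|_{2+\iota} \Big]^{2+\iota} \\
&\le O(1) \Big[ \frac{1}{K^{\underline{d}}} + \sum_{i=0}^{K} \frac{1}{(i+1)^{1+\underline{d}} (k-i)^{(1-2d_0)/2}} \Big]^{2+\iota} = O\Big( \frac{1}{k^{(2+\iota)\nu_0}} \Big)
\end{aligned}$$

for some $\nu_0 > 0$. Using this with (7.2), we can show that Assumption 3.2(i) holds. Similarly, we can show that Assumption 3.2(iii) holds. Note that $(1 - B)^{d_0} y_t = \varepsilon_t$. By Lemma 2 in [35],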



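$$\big\| D_t(\lambda_0) - E[D_t(\lambda_0) \mid \mathcal{F}_k(t)] \big\|_{2+\iota}^{2+\iota} = a_\iota\, E\Big| \sum_{i=k}^{\infty} \frac{\varepsilon_{t-i}}{i} \Big|^{2+\iota} = O\big( k^{-(2+\iota) \times 0.5} \big),$$

where $a_\iota = E|\varepsilon_t|^{2+\iota} < \infty$. Thus, $2\nu = 1$. Uniformly in $t$, it follows that

$$\big\| E[D_t(\lambda_0) \mid \mathcal{F}_{k+1}(t)] - E[D_t(\lambda_0) \mid \mathcal{F}_k(t)] \big\|_{2+\iota}^{2+\iota} = a_\iota\, E\Big| \frac{\varepsilon_{t-k}}{k} \Big|^{2+\iota} = O\Big( \frac{1}{k^{2+\iota}} \Big).$$

Thus, we have that $\iota_1 = \iota$ and $\nu_1 = 1$. By the preceding two equations, we know that Assumption 3.2(ii) holds. □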



Proof of Assumption 3.3. Since $y_t = 0$ for $t \le 0$, by (7.1), we have

$$E \sup_{\Theta} |\varepsilon_t(\lambda) - \tilde{\varepsilon}_t(\lambda)| = E \sup_{\Theta} \Big| \sum_{i=t}^{\infty} a_{0i}(\lambda)\, y_{t-i} \Big| = O(t^{-\underline{d}}).$$

Using this with (7.2), we can show that Assumption 3.3(i) holds. Similarly, we can show that the first part of Assumption 3.3(iii) holds. We now verify the second parts of Assumption 3.3(ii) and 3.3(iii). Denote

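$$A_t = \varepsilon_t(\lambda_0) - \tilde{\varepsilon}_t(\lambda_0) = \sum_{i=t}^{\infty} a_{0i}(\lambda_0)\, y_{t-i},$$

$$A_{1t} = \frac{\partial \varepsilon_t(\lambda_0)}{\partial d} - \frac{\partial \tilde{\varepsilon}_t(\lambda_0)}{\partial d} = \sum_{i=t}^{\infty} a_{1i}(\lambda_0)\, y_{t-i},$$

$$A_{2t} = \frac{\partial \varepsilon_t(\lambda_0)}{\partial d} - v_t = \sum_{i=t}^{\infty} \frac{-\varepsilon_{t-i}}{i},$$

where $v_t = -\sum_{i=1}^{t-1} \varepsilon_{t-i}/i$. We next make the decomposition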



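$$\begin{aligned}
D_t(\lambda_0) - \tilde{D}_t(\lambda_0) &= \varepsilon_t(\lambda_0) A_{1t} + \frac{\partial \tilde{\varepsilon}_t(\lambda_0)}{\partial d} A_t \\
&= A_t A_{2t} - A_t A_{1t} + A_t v_t + \varepsilon_t(\lambda_0) A_{1t}.
\end{aligned} \tag{7.3}$$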



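By (7.1), we can write $A_t$ as

$$A_t = \sum_{i=t}^{\infty} \sum_{j=i}^{\infty} a_{0i}(\lambda_0)\, c_{0,j-i}\, \varepsilon_{t-j} = \sum_{j=t}^{\infty} \Big[ \sum_{i=t}^{j} a_{0i}(\lambda_0)\, c_{0,j-i} \Big] \varepsilon_{t-j}.$$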



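By Lemma 2 in [35], we can show that $E|A_t|^{2+\iota}$ is bounded by

$$C \Big\{ \sum_{j=t}^{\infty} \Big[ \sum_{i=t}^{j} a_{0i}(\lambda_0)\, c_{0,j-i} \Big]^2 \Big\}^{1+\iota/2} \le C \Big\{ \sum_{j=t}^{\infty} \Big[ \sum_{i=t}^{j} \frac{1}{(j-i+1)^{1-d_0}\, i^{1+d_0}} \Big]^2 \Big\}^{1+\iota/2},$$

where $C$ is some constant independent of $t$. Furthermore, by Lemma A.3 with $u = 1 - d_0$ and $v = 1 + d_0$, it follows that

$$E|A_t|^{2+\iota} \le O(1) \Big[ \sum_{j=t}^{\infty} \Big( \frac{1}{j^{1-d_0}\, t^{d_0}} \Big)^2 \Big]^{1+\iota/2} = O\big( t^{-(1+\iota/2)} \big). \tag{7.4}$$

Furthermore, we can show that

$$E|A_{1t}|^{2+\iota} = O\big( (\log^2 t / t)^{1+\iota/2} \big) \quad\text{and}\quad E|A_{2t}|^{2+\iota} = O\big( t^{-(1+\iota/2)} \big). \tag{7.5}$$

By (7.3)-(7.5), we can show that $\| D_t(\lambda_0) - \tilde{D}_t(\lambda_0) \|_{1+\iota/2} = O(t^{-1/2} \log t)$, that is, the second part of Assumption 3.3(iii) holds.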

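Denote $\bar{k} = n - k$. By (7.4)-(7.5), $E|A_t A_{it}| \le (E A_t^2\, E A_{it}^2)^{1/2} \le (E|A_t|^{2+\iota})^{1/(2+\iota)} (E|A_{it}|^{2+\iota})^{1/(2+\iota)} = O(t^{-1} \log^2 t)$. When $i = 1, 2$, by Lemma A.3 with $u = 1/2 - \delta$ and $v = 1$ and the Cauchy-Schwarz inequality, we can show that

$$P\Big( \max_{g_n \le \bar{k} \le n} \frac{1}{\bar{k}^{1/2-\delta}} \sum_{t=k+1}^{n} |A_t A_{it}| > \epsilon \Big) \le P\Big( \sum_{t=1}^{n} \frac{|A_t A_{it}|}{(n-t+1)^{1/2-\delta}} > \epsilon \Big) \to 0. \tag{7.6}$$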
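Next, consider the third term in (7.3). We first make the decomposition

$$\sum_{t=k+1}^{n} v_t A_t = -\sum_{i=1}^{k-1} \frac{1}{i} \sum_{t=k+1}^{n} \varepsilon_{t-i} A_t - \sum_{i=k}^{n-1} \frac{1}{i} \sum_{t=i+1}^{n} \varepsilon_{t-i} A_t,$$

where the last term is obtained by exchanging the order of summation in $\sum_{t=k+1}^{n} \sum_{i=k}^{t-1} \varepsilon_{t-i} A_t / i$.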
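Since $A_t$ is $\mathcal{F}_0$-measurable, by Lemma 2 in [35] and Minkowski's inequality, there exists a constant $B$, depending on $\iota$ and $E|\varepsilon_t|^{2+\iota}$, such that

$$E\Big| \sum_{t=k+1}^{n} \varepsilon_{t-i} A_t \Big|^{2+\iota} \le B \Big[ \sum_{t=k+1}^{n} \big( E|A_t|^{2+\iota} \big)^{2/(2+\iota)} \Big]^{1+\iota/2} \le O(1) \Big[ \sum_{t=k+1}^{n} \frac{1}{t} \Big]^{1+\iota/2}$$

uniformly in $i = 1, \ldots, n$. Let $\delta$ be small enough such that $(1/2 - 2\delta)(2 + \iota) > 1$. Since $\bar{k} = n - k \ge n - t + 1$, by the Markov and Minkowski inequalities, we have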



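$$\begin{aligned}
&P\Big( \max_{g_n \le \bar{k} \le n} \frac{1}{\bar{k}^{1/2-\delta}} \Big| \sum_{i=1}^{k-1} \frac{1}{i} \sum_{t=k+1}^{n} \varepsilon_{t-i} A_t \Big| > \epsilon \Big) \\
&\quad \le O(1) \sum_{\bar{k}=g_n}^{n-1} \frac{1}{\bar{k}^{(1/2-\delta)(2+\iota)}} E\Big| \sum_{i=1}^{k-1} \frac{1}{i} \sum_{t=k+1}^{n} \varepsilon_{t-i} A_t \Big|^{2+\iota} \\
&\quad \le O(1) \sum_{\bar{k}=g_n}^{n-1} \frac{1}{\bar{k}^{(1/2-\delta)(2+\iota)}} \Big[ \sum_{i=1}^{k-1} \frac{1}{i} \Big\| \sum_{t=k+1}^{n} \varepsilon_{t-i} A_t \Big\|_{2+\iota} \Big]^{2+\iota} \\
&\quad \le O(1) \sum_{\bar{k}=g_n}^{n-1} \frac{1}{\bar{k}^{(1/2-\delta)(2+\iota)}} \Big[ \sum_{i=1}^{k-1} \frac{1}{i} \Big( \sum_{t=k+1}^{n} \frac{1}{t} \Big)^{1/2} \Big]^{2+\iota} \\
&\quad \le O(1) \sum_{\bar{k}=g_n}^{n-1} \frac{1}{\bar{k}^{(1/2-2\delta)(2+\iota)}} \Big[ \sum_{i=1}^{k-1} \frac{1}{i} \Big( \sum_{t=k+1}^{n} \frac{1}{(n-t+1)^{2\delta}\, t} \Big)^{1/2} \Big]^{2+\iota} \\
&\quad \le O(1) \sum_{\bar{k}=g_n}^{n-1} \frac{1}{\bar{k}^{(1/2-2\delta)(2+\iota)}} \Big[ \sum_{i=1}^{k-1} \frac{1}{i} \Big( \frac{\log n}{n^{2\delta}} \Big)^{1/2} \Big]^{2+\iota} = o(1),
\end{aligned}$$

where the next-to-last step uses Lemma A.3. Similarly, we have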
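$$\begin{aligned}
&P\Big( \max_{g_n \le \bar{k} \le n} \frac{1}{\bar{k}^{1/2-\delta}} \Big| \sum_{i=k}^{n-1} \frac{1}{i} \sum_{t=i+1}^{n} \varepsilon_{t-i} A_t \Big| > \epsilon \Big) \\
&\quad \le O(1) \sum_{\bar{k}=g_n}^{n-1} \frac{1}{\bar{k}^{(1/2-\delta)(2+\iota)}} E\Big| \sum_{i=k}^{n-1} \frac{1}{i} \sum_{t=i+1}^{n} \varepsilon_{t-i} A_t \Big|^{2+\iota} \\
&\quad \le O(1) \sum_{\bar{k}=g_n}^{n-1} \frac{1}{\bar{k}^{(1/2-\delta)(2+\iota)}} \Big[ \sum_{i=k}^{n-1} \frac{1}{i} \Big( \sum_{t=i+1}^{n} \frac{1}{t} \Big)^{1/2} \Big]^{2+\iota} = o(1),
\end{aligned}$$

where we have used $\bar{k} \ge n - i \ge n - t + 1$. By the preceding two inequalities, we have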



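$$P\Big( \max_{g_n \le \bar{k} \le n} \frac{1}{\bar{k}^{1/2-\delta}} \Big| \sum_{t=k+1}^{n} v_t A_t \Big| > \epsilon \Big) = o(1), \tag{7.7}$$

where $\bar{k} = n - k$. Similarly, we can show that

$$P\Big( \max_{g_n \le \bar{k} \le n} \frac{1}{\bar{k}^{1/2-\delta}} \Big| \sum_{t=k+1}^{n} \varepsilon_t(\lambda_0) A_{1t} \Big| > \epsilon \Big) = o(1). \tag{7.8}$$

Finally, by (7.3) and (7.6)-(7.8), we can show that the second part of Assumption 3.3(ii) holds. The first part of Assumption 3.3(ii) can be similarly proved and, hence, the details are omitted. □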



APPENDIX: LEMMAS A.1-A.3

We state three lemmas here whose proofs can be found in [28].

Lemma A.1. If Assumptions 3.1(i), 3.2(i) and 3.3(i) hold, then for any $\eta > 0$,

(a) $\displaystyle \lim_{n \to \infty} P\Big( \max_{g_n \le k \le n} \sup_{|\lambda - \lambda_0| \ge \eta} \Big\{ \sum_{t=1}^{k} [l_t(\lambda) - l_t(\lambda_0)] + \sqrt{k} \Big\} > 0 \Big) = 0$,

(b) $\displaystyle \lim_{n \to \infty} P\Big( \max_{g_n \le n-k \le n} \sup_{|\lambda - \lambda_0| \ge \eta} \Big\{ \sum_{t=k+1}^{n} [l_t(\lambda) - l_t(\lambda_0)] + \sqrt{n-k} \Big\} > 0 \Big) = 0$.

Lemma A.2. If the assumptions of Theorem 3.1 hold, then there exists a $\delta > 0$ such that



(a) max — 

g n <k<n n 



E 

t=i 



D t (X n (k))D' t (X n (k)) - Q 



(b) 



max 



(n - k) 6 



g n <n—k<n n 



Y, [Dt(\in(k))D' t (Xin(k)) - n] 

t-k+l 



o p (l). 



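Lemma A.3. For any $u \in (0, 1)$ and $v \in (0, \infty)$, it follows that

$$\sum_{t=r+1}^{j} \frac{1}{(j - t + 1)^{u}\, t^{v}} = O(1) \begin{cases} j^{-u} \log j, & \text{if } v = 1, \\ j^{-u}\, r^{1-v}, & \text{if } v > 1, \end{cases}$$

where $O(1)$ holds uniformly in $j > r \ge 1$.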
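Lemma A.3 is used repeatedly in Section 7, so a quick numerical sanity check may help the reader. The Python sketch below assumes the statement reads $\sum_{t=r+1}^{j} (j-t+1)^{-u} t^{-v} = O(1)\, j^{-u} \log j$ for $v = 1$ and $O(1)\, j^{-u} r^{1-v}$ for $v > 1$; it evaluates the ratio of the sum to the claimed rate on a small grid. The function names and the grid are illustrative, and a bounded ratio on a grid only illustrates the lemma; it does not prove it.

```python
import math


def weighted_sum(u: float, v: float, r: int, j: int) -> float:
    """Left-hand side: sum over t = r+1, ..., j of (j-t+1)^(-u) * t^(-v)."""
    return sum((j - t + 1) ** (-u) * t ** (-v) for t in range(r + 1, j + 1))


def claimed_rate(u: float, v: float, r: int, j: int) -> float:
    """Right-hand side without the O(1) factor (assumed reading of the lemma)."""
    if v == 1:
        return math.log(j) / j ** u
    return 1.0 / (j ** u * r ** (v - 1))


def max_ratio(u: float, v: float, rs: list[int], js: list[int]) -> float:
    """Largest ratio of sum to claimed rate over all grid pairs with j > r."""
    return max(
        weighted_sum(u, v, r, j) / claimed_rate(u, v, r, j)
        for r in rs
        for j in js
        if j > r
    )


if __name__ == "__main__":
    rs, js = [1, 5, 50], [60, 500, 5000]
    # The pair (u, v) = (1 - d0, 1 + d0) with d0 = 0.3 mirrors the use in (7.4).
    print("v > 1:", max_ratio(0.7, 1.3, rs, js))
    # The pair (u, v) = (1/2 - delta, 1) with delta = 0.1 mirrors the use in (7.6).
    print("v = 1:", max_ratio(0.4, 1.0, rs, js))
```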